Run(string,string,IList<string>,DocumentFormat,OcrProgressCallback) Method

Summary

Converts an image file on disk to a document file in the specified document format, optionally using a separate zone file for each page.

Syntax

C#

public void Run( 
   string imageFileName, 
   string documentFileName, 
   IList<string> zoneFileNames, 
   DocumentFormat format, 
   OcrProgressCallback callback 
)

C++/CLI

void Run(  
   String^ imageFileName, 
   String^ documentFileName, 
   IList<String^>^ zoneFileNames, 
   DocumentFormat format, 
   OcrProgressCallback^ callback 
)

Python

def Run(self,imageFileName,documentFileName,zoneFileNames,format,callback):

Parameters

imageFileName
The name of the file containing the image.

documentFileName
The name of the resulting document file.

zoneFileNames
Optional list of file names for prepared zone files for the pages. This parameter can be a null (Nothing in VB) reference.

format
The output document format. If this parameter is DocumentFormat.User, the document is saved using the native engine format set in IOcrDocumentManager.EngineFormat, provided the engine in use supports native formats; if it does not, an exception is thrown.

callback
Optional callback to show operation progress.

Remarks

This method will perform the following operations:

1. Trigger the JobStarted event.
2. Create one or more IOcrDocument objects into which to store the pages. The number of OCR documents created depends upon MaximumThreadsPerJob. If this value is 0 (maximum CPUs/cores) or is greater than 1 and multiple threads are supported by this engine, then more than one document can be created to participate in the recognition process. The document created is disk-based.
3. Loop through all the pages in imageFileName, and for each page:
   - Create the page using IOcrEngine.CreatePage.
   - If zoneFileNames contains a valid zone file name for the current page (the index in this array matches the index of the page being loaded), load the zones with IOcrPage.LoadZones, applying it to the page. If zoneFileNames is a null (Nothing in VB) reference or its entry for the current page is a null reference, auto-decomposing of the page is performed instead with IOcrPage.AutoZone.
   - Call IOcrPage.Recognize to get the OCR data of the page.
   - LEADTOOLS OCR Module - LEAD Engine: Add the page to the document using IOcrDocument.Pages.Add.
   - For engines other than LEADTOOLS OCR Module - LEAD Engine: Save the current recognition data to a LEADTOOLS Temporary Document (LTD) file and clear the OCR document if multiple documents are being recognized or the current number of recognized pages is greater than the maximum specified in MaximumPagesBeforeLtd.
4. When all pages are processed they are saved to the resulting file name specified in documentFileName using the format specified in format. If LTD was used, the temporary file is converted to the final document using DocumentWriter.Convert and optionally DocumentWriter.AppendLtd.
5. Delete all OCR documents and temporary files.
6. Trigger the JobCompleted event.
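
The document-creation and zone-selection decisions described above can be modeled in a few lines. The following is only an illustrative sketch in plain Python, not LEADTOOLS code: the function names `may_use_multiple_documents` and `zone_action_for_page` are hypothetical, and treating an empty string or a list shorter than the page count as "no zone file" is an assumption that the text above does not confirm.

```python
from typing import Optional, Sequence, Tuple


def may_use_multiple_documents(max_threads_per_job: int,
                               engine_supports_threads: bool) -> bool:
    """Model of the rule stated for MaximumThreadsPerJob.

    More than one OCR document can be created when the value is 0
    (maximum CPUs/cores) or greater than 1, and the engine supports
    multiple threads.
    """
    wants_threads = max_threads_per_job == 0 or max_threads_per_job > 1
    return engine_supports_threads and wants_threads


def zone_action_for_page(
    page_index: int,
    zone_file_names: Optional[Sequence[Optional[str]]],
) -> Tuple[str, Optional[str]]:
    """Decide between loading zones and auto-zoning for one page.

    Returns ("load_zones", file_name) when the list has a usable entry
    at page_index, otherwise ("auto_zone", None).
    """
    # A null list means every page is auto-zoned.
    if zone_file_names is None:
        return ("auto_zone", None)

    # Assumption: an index outside the list is treated like a null entry.
    if 0 <= page_index < len(zone_file_names):
        name = zone_file_names[page_index]
        # Assumption: an empty string is treated like a null entry.
        if name:
            return ("load_zones", name)

    return ("auto_zone", None)


if __name__ == "__main__":
    # Hypothetical zone file names; page 1 has no prepared zones.
    zones = ["page0.ozf", None, "page2.ozf"]
    for index in range(3):
        print(index, zone_action_for_page(index, zones))
    print(may_use_multiple_documents(0, True))
```

In this model the entry at index N applies only to page N, so a caller who has zone files for some pages only would leave null entries in the list for the others rather than shortening it.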
You can use the JobProgress event or callback to show the operation's progress (if threading is used), or to abort it if threading is not used. For more information and an example, refer to OcrProgressCallback.
You can use the JobOperation event to get information regarding the current operation being performed. For more information and an example, refer to JobOperation.

The IOcrAutoRecognizeManager interface also has the following members that can be used with this method:

Option	Description
MaximumPagesBeforeLtd	Adds support for converting a document with an unlimited number of pages. An OCR recognition operation on a document that contains a large number of pages (10 or more) can result in an out-of-memory error. All LEADTOOLS OCR engines support saving the intermediate recognition results to a temporary LTD file (DocumentFormat.LTD). The results of subsequent pages are appended to this temporary file. When all the pages of the document have been recognized, the engine converts the temporary LTD file to the desired output format. The MaximumPagesBeforeLtd property defines the maximum number of pages processed as a whole. For example, if the original document has 20 pages and the value of this property is 8, the engine will recognize the first 8 pages and save the results to a temporary file, recognize the next 8 pages and append those results to the temporary file, and finally, recognize the last 4 pages and convert the temporary document into the final format.
PreprocessPageCommands	Holds an array of OcrAutoPreprocessPageCommand items that control which auto-preprocess operations to perform on each page prior to recognition.
MaximumThreadsPerJob	Maximum number of threads to use per job. You can instruct IOcrAutoRecognizeManager to use all available machine CPUs/cores when recognizing a document. This will greatly reduce the time required to finish the OCR operation. The LEADTOOLS OCR Module - LEAD Engine uses the system thread pool and does not require a set number of threads. A value of 1 will disable threading and any other value will be treated as "use multi-threading".
JobErrorMode	Ability to resume on non-critical errors. For example, a non-critical error would be a source document containing a page that could not be recognized. The offending page will be added to the final document as a graphics image and recognition will continue to the next page.
JobStarted, JobProgress, JobOperation and JobCompleted events	Events to track when both synchronous and asynchronous jobs have started, are being run or have been completed.
AbortAllJobs	Aborts all running and pending jobs.
EnableTrace	Outputs debug messages to the standard .NET trace listeners.
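
The MaximumPagesBeforeLtd example in the table (20 pages with a value of 8) is simple batching arithmetic. The sketch below is plain Python and only models the page counts; it does not call LEADTOOLS, `ltd_batches` is a hypothetical helper name, and it assumes the property value is a positive integer because the text above does not say how other values are interpreted.

```python
from typing import List


def ltd_batches(total_pages: int, max_pages_before_ltd: int) -> List[int]:
    """Return the number of pages recognized in each pass.

    Each pass handles at most max_pages_before_ltd pages before the
    results are written to (or appended to) the temporary LTD file.
    """
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    # Assumption: only positive limits are modeled here.
    if max_pages_before_ltd < 1:
        raise ValueError("max_pages_before_ltd must be at least 1")

    batches = []
    remaining = total_pages
    while remaining > 0:
        size = min(remaining, max_pages_before_ltd)
        batches.append(size)
        remaining -= size
    return batches


if __name__ == "__main__":
    # The 20-page document from the table, with a limit of 8 pages.
    print(ltd_batches(20, 8))  # [8, 8, 4]
```

A smaller limit means more passes and more appends to the temporary file, in exchange for holding fewer recognized pages in memory at once.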

Example

C#

using System; 
using System.IO; 
using Leadtools; 
using Leadtools.Codecs; 
using Leadtools.Ocr; 
using Leadtools.Document.Writer; 
using Leadtools.Forms.Common; 
using Leadtools.WinForms; 
 
public void OcrAutoRecognizeManagerRun1Example() 
{ 
   string tifFileName = Path.Combine(LEAD_VARS.ImagesDir, "Ocr1.tif"); 
   string pdfFileName = Path.Combine(LEAD_VARS.ImagesDir, "Ocr1.pdf"); 
 
   // Create an instance of the engine 
   using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.LEAD)) 
   { 
      // Start the engine using default parameters 
      Console.WriteLine("Starting up the engine..."); 
      ocrEngine.Startup(null, null, null, LEAD_VARS.OcrLEADRuntimeDir); 
 
      IOcrAutoRecognizeManager ocrAutoRecognizeManager = ocrEngine.AutoRecognizeManager; 
 
      // Recognize the document 
      ocrAutoRecognizeManager.Run(tifFileName, pdfFileName, null, DocumentFormat.Pdf, null); 
   } 
} 
 
static class LEAD_VARS 
{ 
   public const string ImagesDir = @"C:\LEADTOOLS23\Resources\Images"; 
   public const string OcrLEADRuntimeDir = @"C:\LEADTOOLS23\Bin\Common\OcrLEADRuntime"; 
}

Java

import java.io.File; 
import java.io.FileNotFoundException; 
import java.io.FileWriter; 
import java.io.FilenameFilter; 
import java.io.IOException; 
import java.nio.file.Files; 
import java.nio.file.Path; 
import java.nio.file.Paths; 
import java.util.ArrayList; 
import java.util.concurrent.ExecutorService; 
import java.util.concurrent.Executors; 
import java.util.concurrent.atomic.AtomicInteger; 
 
import org.junit.*; 
import org.junit.runner.JUnitCore; 
import org.junit.runner.Result; 
import org.junit.runner.notification.Failure; 
 
import static org.junit.Assert.*; 
 
import leadtools.*; 
import leadtools.document.writer.*; 
import leadtools.internal.AutoResetEvent; 
import leadtools.ocr.*; 
 
 
public void OcrAutoRecognizeManagerRun1Example() { 
   String LEAD_VARS_ImagesDir = "C:\\LEADTOOLS23\\Resources\\Images"; 
   String LEAD_VARS_OcrLEADRuntimeDir = "C:\\LEADTOOLS23\\Bin\\Common\\OcrLEADRuntime"; 
   String tifFileName = combine(LEAD_VARS_ImagesDir, "Ocr1.tif"); 
   String pdfFileName = combine(LEAD_VARS_ImagesDir, "Ocr1.pdf"); 
 
   // Create an instance of the engine 
   OcrEngine ocrEngine = OcrEngineManager.createEngine(OcrEngineType.LEAD); 
 
   // Start the engine using default parameters 
   System.out.println("Starting up the engine..."); 
   ocrEngine.startup(null, null, null, LEAD_VARS_OcrLEADRuntimeDir); 
   assertTrue("OCR Engine unsuccessfully started", ocrEngine.isStarted()); 
 
   OcrAutoRecognizeManager ocrAutoRecognizeManager = ocrEngine.getAutoRecognizeManager(); 
 
   // Recognize the document 
   ocrAutoRecognizeManager.run(tifFileName, pdfFileName, DocumentFormat.PDF, null); 
   ocrEngine.dispose(); 
}

Requirements

Target Platforms

Reference

IOcrAutoRecognizeManager Interface

IOcrAutoRecognizeManager Members

Overload List

Programming with the LEADTOOLS .NET OCR

Multi-Threading with LEADTOOLS OCR

Help Version 23.0.2024.4.19

Leadtools.Ocr Assembly
